Strandbox¶

Dataset¶

The dataset consists of scientific articles from three journals:

  1. Environmental Innovation and Societal Transitions (EIST)
  2. Research in the Sociology of Organizations (RSOG)
  3. Sustainability Science (Sus-Sci)
Journal   # Articles before preprocessing   # Articles after preprocessing
EIST      683                               574
RSOG      659                               639
Sus-Sci   1191                              1121
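
As a quick check on the table above, the share of articles that survive preprocessing can be computed directly from the counts. The dictionary layout and the helper name `retention_rate` are illustrative and not part of the original pipeline.

```python
# Article counts copied from the table above: (before, after) preprocessing.
counts = {
    "EIST": (683, 574),
    "RSOG": (659, 639),
    "Sus-Sci": (1191, 1121),
}


def retention_rate(before, after):
    """Fraction of articles kept after preprocessing."""
    return after / before


retention = {journal: retention_rate(b, a) for journal, (b, a) in counts.items()}
for journal, rate in retention.items():
    print(f"{journal}: {rate:.1%} retained")
```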
In [1]:
import json
import pandas as pd
In [2]:
data_path = 'data/extract_EIST.json'
with open(data_path, 'r') as fd:
    data = json.load(fd)
df = pd.DataFrame(data).T
df.head()
Out[2]:
file_name doi title abstract text location year authors
1 -It-s-not-talked-about---The-risk-of-failure-_... 10.1016/j.eist.2020.02.008 “It's not talked about”: The risk of failure i... Scholars of sustainability transition have giv... {'Introduction': ' A transition away from the ... UK 2020 [Beck Collins]
2 -Making-energy-transition-work---Bricolage-_20... 10.1016/j.eist.2020.07.005 “Making energy transition work”: Bricolage in ... In the quest for energy transition pathways, e... {'Introduction': ' Local energy transitions ha... Austria 2020 [Johannes Suitner, Martha Ecker, T U Wien]
3 1-s2.0-S2210422419302618-main 10.1016/j.eist.2019.10.005 Thinking about individual actor-level perspect... The 2019 STRN research agenda identifies conne... {'Introduction: background and rationale': ' T... Germany 2020 [Paul Upham, Paula Bögel, Elisabeth Dütschke]
4 1-s2.0-S2210422419302850-main 10.1016/j.eist.2019.11.008 Not more but different: A comment on the trans... The sustainability transitions research networ... {'Introduction': ' The comprehensive agenda fo... UK 2020 [Debbie Hopkins, Johannes Kester, Toon Meelen,...
5 1-s2.0-S2210422420300277-main 10.1016/j.eist.2020.02.001 Let's focus more on negative trends: A comment... Much has been written on sustainability transi... {'Introduction': ' The analysis of sustainabil... UK 2020 [Miklós Antal, Giulio Mattioli, Imogen Rattle,...
In [46]:
df.loc[1,'abstract']
Out[46]:
'Scholars of sustainability transition have given much attention to local experiments in ‘protected spaces’ where system innovations can be initiated and where learning about those innovations can occur. However, local project participants’ conceptions of success are often different to those of transition scholars; where scholars see a successful learning experience, participants may see a project which has failed to “deliver”. This research looks at two UK case studies of energy retrofit projects – Birmingham Energy Savers and Warm Up North, both in the UK, and the opportunities they had for learning. The findings suggest that perceptions of failure and external real world factors reducing the capacity to experiment, meant that opportunities for learning were not well capitalised upon. This research makes a contribution to the sustainability transitions literature which has been criticised for focusing predominantly on successful innovation, and not on the impact of failure. © 2020 Elsevier B.V.'
In [3]:
data_path = 'data/prepro_EIST.json'
with open(data_path, 'r') as fd:
    data = json.load(fd)
df = pd.DataFrame(data)
In [4]:
df.head()
Out[4]:
title abstract text id time class
0 “It's not talked about”: The risk of failure i... Scholars of sustainability transition have giv... scholar sustainability transition given attent... 0 2020 EIST
1 “It's not talked about”: The risk of failure i... Scholars of sustainability transition have giv... transition use fossil fuel heat power ineffici... 0 2020 EIST
2 “It's not talked about”: The risk of failure i... Scholars of sustainability transition have giv... provide obstacle transfer learning following h... 0 2020 EIST
3 “It's not talked about”: The risk of failure i... Scholars of sustainability transition have giv... multi level perspective socio technical transi... 0 2020 EIST
4 “It's not talked about”: The risk of failure i... Scholars of sustainability transition have giv... research based comparative case study local ex... 0 2020 EIST
In [47]:
df['text'][0]
Out[47]:
'scholar sustainability transition given attention local experiment protected space system innovation initiated learning innovation occur local project participant conception success different transition scholar scholar successful learning experience participant project failed deliver research look case study energy retrofit project birmingham energy saver warm north uk opportunity learning finding suggest perception failure external real world factor reducing capacity experiment meant opportunity learning capitalised research make contribution sustainability transition literature criticised focusing successful innovation impact failure'
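
The preprocessing step itself is not shown in this notebook. The output above looks lowercased, stripped of punctuation and stopwords, and lemmatized. The following is only a minimal sketch of that kind of cleaning, assuming a tiny hand-written stopword list and no lemmatization; `simple_preprocess` and `STOPWORDS` are hypothetical names.

```python
import re

# Tiny illustrative stopword list (an assumption); a real pipeline would use a fuller one.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "can", "for",
    "from", "has", "have", "in", "is", "it", "of", "on", "or", "that",
    "the", "to", "was", "where", "which", "with",
}


def simple_preprocess(text, min_len=2):
    """Lowercase, keep alphabetic tokens, drop stopwords and very short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [t for t in tokens if t not in STOPWORDS and len(t) >= min_len]
    return " ".join(kept)


sample = "Scholars of sustainability transition have given much attention to local experiments."
print(simple_preprocess(sample))
```

Unlike the text shown above, this sketch keeps plural forms such as "scholars", because it does not lemmatize.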

Topic modelling¶

1. LDA: Optimal Number of Topics¶

In [5]:
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

img = mpimg.imread('disp/3journals_optim.png')
plt.imshow(img)
plt.axis('off')
Out[5]:
(-0.5, 575.5, 431.5, -0.5)

2. Topic Network¶

In [63]:
from IPython.display import IFrame

IFrame(src='disp/topic_network_3_journals_antons.html', width=900, height=700)
Out[63]:

2.1 Network Centrality¶

In [11]:
df = pd.read_csv('disp/centrality_full_network.csv')
df.sort_values(by=['Degree Centrality'], ascending=False).head(7)
Out[11]:
Unnamed: 0 Topic Degree Centrality Degree per Article Betweenness Centrality Betweenness per Article Clustering
0 0 0_interview_data_conducted_participant 59.430657 0.716032 2213.512033 26.668820 0.130918
5 5 5_transformation_actor_change_transition 38.277372 0.797445 806.733079 16.806939 0.180654
9 9 9_emission_scenario_reduction_carbon 36.262774 0.614623 936.007673 15.864537 0.163492
2 2 2_complexity_system_approach_process 29.211679 0.561763 470.744724 9.052783 0.182266
137 137 137_game_approach_process_change 25.182482 1.144658 213.092696 9.686032 0.303333
4 4 4_area_land_water_scenario 23.167883 0.413712 288.320436 5.148579 0.245059
15 15 15_forest_land_scenario_deforestation 23.167883 0.772263 252.689720 8.422991 0.245059
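
How the centrality columns were computed is not shown here, and the "Degree Centrality" values above are not on the usual 0 to 1 scale, so the exact definition is unknown. As a rough sketch of the general idea, the snippet below builds a weighted topic co-occurrence graph from made-up document-topic assignments and computes a weighted degree and an unweighted local clustering coefficient. All function names and the toy data are assumptions; in practice a graph library would usually do this.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(doc_topics):
    """Undirected graph; edge weight = number of documents containing both topics."""
    graph = defaultdict(dict)
    for topics in doc_topics:
        for a, b in combinations(sorted(set(topics)), 2):
            graph[a][b] = graph[a].get(b, 0) + 1
            graph[b][a] = graph[b].get(a, 0) + 1
    return graph


def weighted_degree(graph, node):
    """Sum of the weights of all edges attached to `node`."""
    return sum(graph[node].values())


def clustering(graph, node):
    """Share of neighbour pairs of `node` that are themselves connected."""
    neighbours = list(graph[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return 2 * links / (k * (k - 1))


# Made-up topic assignments: one list of topic numbers per document.
doc_topics = [[0, 5], [0, 5, 9], [0, 2], [9, 2]]
g = cooccurrence_graph(doc_topics)
for topic in sorted(g):
    print(topic, weighted_degree(g, topic), round(clustering(g, topic), 3))
```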

2.2 Topic Co-occurrence Distribution¶

In [2]:
import plotly.express as px
df = pd.read_csv('disp/edge_weight_dist_full_network.csv')
fig = px.pie(df, values='%', names='Edge Weight')
fig.show()
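
The CSV read above already holds the distribution in its `Edge Weight` and `%` columns. One way such a table could be derived from raw edge weights is sketched below; the input list is made up and `edge_weight_distribution` is a hypothetical helper.

```python
from collections import Counter


def edge_weight_distribution(edge_weights):
    """Return {edge_weight: percentage of edges that have this weight}."""
    counts = Counter(edge_weights)
    total = sum(counts.values())
    return {w: 100 * c / total for w, c in sorted(counts.items())}


# Made-up co-occurrence weights, one entry per edge in the topic network.
weights = [1, 1, 1, 2, 2, 3, 1, 1]
dist = edge_weight_distribution(weights)
for weight, pct in dist.items():
    print(f"weight {weight}: {pct:.1f}%")
```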

3. Topics Landscape¶

In [15]:
df = pd.read_csv('disp/3_journals_topics.csv')
df = df.rename(columns={'Volumne': 'Volume'})
df.head()
Out[15]:
Unnamed: 0 Topic Label topic_nr most_freq_words rep_doc_year title Volume Authors
0 1 0_interview_data_conducted_participant 0 ['interview', 'data', 'conducted', 'participan... 2020 Sharing among neighbours in a Norwegian suburb 37 Westskog H., Aase T.H., Standal K., Tellefsen S.
1 2 1_university_student_school_stanford 1 ['university', 'student', 'school', 'stanford'... 2010 Chapter 23: The Stanford organizational studie... 28 Meyerson D.E.
2 3 2_complexity_system_approach_process 2 ['complexity', 'system', 'approach', 'process'... 2020 SHIFT IN HYBRIDITY IN RESPONSE TO ENVIRONMENTA... 69 Ramus T., Vaccaro A., Versari P., Brusoni S.
3 4 3_sustainability_research_student_science 3 ['sustainability', 'research', 'student', 'sci... 2021 The patterns of curriculum change processes th... 16.0 Weiss M., Barth M., von Wehrden H.
4 5 4_area_land_water_scenario 4 ['area', 'land', 'water', 'scenario', 'forest'... 2019 The seasonal and scale-dependent associations ... 14.0 Aiba M., Shibata R., Oguro M., Nakashizuka T.

4. Topics vs Documents¶

In [67]:
plt.figure(figsize = (35,30))
img = mpimg.imread('disp/3_journals.png')
plt.imshow(img, aspect='auto')
plt.axis('off')
Out[67]:
(-0.5, 1999.5, 1499.5, -0.5)

5. Hierarchical Plots¶

EIST Topics¶

In [15]:
from IPython.display import IFrame

IFrame(src='disp/hierarchical_topics_eist.html', width=1000, height=1000)
Out[15]:

RSOG Topics¶

In [17]:
from IPython.display import IFrame

IFrame(src='disp/hierarchical_topics_rsog.html', width=1000, height=1000)
Out[17]:

Sus-Sci Topics¶

In [18]:
from IPython.display import IFrame

IFrame(src='disp/hierarchical_topics_sus_sci.html', width=1000, height=1200)
Out[18]:

Combined 3 journals¶

In [10]:
from IPython.display import IFrame

IFrame(src='disp/hierarchical_topics_3_journals.html', width=1000, height=1400)
Out[10]:

Insights¶

1. Descriptive Statistics¶

In [68]:
df = pd.read_csv('disp/descriptive_stats_3_journals.csv')
df.head()
Out[68]:
Unnamed: 0 Topic Label standardized_mean max min
0 0 0_interview_data_conducted_participant 3.637 0.560 0.0
1 1 1_university_student_school_stanford 1.789 0.951 0.0
2 2 2_complexity_system_approach_process 1.542 0.630 0.0
3 3 3_sustainability_research_student_science 1.447 0.863 0.0
4 4 4_area_land_water_scenario 3.686 0.854 0.0

2. Temporal Trajectory¶

In [17]:
df = pd.read_csv('disp/3_journals_temp_dev_trajc.csv')
df.head()
Out[17]:
Unnamed: 0 topic_label count year_mean year_std year_min year_max coeff_linear coeff_quadratic coeff_linear_of_quadratic
0 0 0_interview_data_conducted_participant 83 2015.153846 5.446893 2001 2022 0.7577 0.0673 0.7577
1 1 1_university_student_school_stanford 17 2015.500000 3.905125 2010 2021 -0.8770 0.1167 -0.8770
2 2 2_complexity_system_approach_process 52 2016.083333 4.050892 2009 2022 0.4452 0.0382 0.4452
3 3 3_sustainability_research_student_science 36 2015.692308 4.120952 2009 2022 0.2857 0.0407 0.2857
4 4 4_area_land_water_scenario 56 2015.076923 4.730613 2007 2022 0.4114 0.0731 0.4114
In [21]:
import numpy as np
df['year_mean'] = np.around(df['year_mean'], 3)
df['year_std'] = np.around(df['year_std'], 3)
In [27]:
df.drop(columns=['coeff_linear_of_quadratic']).to_csv('3_journals_temp_dev_trajc.csv', index=False)
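
The notebook does not show how `coeff_linear` and `coeff_quadratic` were estimated. A plausible reading is that a linear and a quadratic trend were fitted to each topic's yearly prevalence. The sketch below illustrates that idea with a small pure-Python least-squares fit on made-up counts; `polyfit_least_squares` is a hypothetical helper, and `numpy.polyfit` would be the usual choice in practice.

```python
def polyfit_least_squares(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns [c0, c1, ..., c_degree] for the model c0 + c1*x + ... + c_degree*x**degree.
    """
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)) and b[i] = sum(y * x^i).
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = A[row][col] / A[col][col]
            for k in range(col, n):
                A[row][k] -= factor * A[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - s) / A[i][i]
    return coeffs


# Made-up yearly article counts for a single topic.
years = [2016, 2017, 2018, 2019, 2020, 2021, 2022]
counts = [2, 3, 3, 5, 6, 8, 11]

# Centre the years so the normal equations stay well conditioned.
mid = sum(years) / len(years)
t = [y - mid for y in years]

linear = polyfit_least_squares(t, counts, 1)
quadratic = polyfit_least_squares(t, counts, 2)
print("linear slope:", round(linear[1], 4))
print("quadratic term:", round(quadratic[2], 4))
```

With centred, evenly spaced years, the linear term of the quadratic fit equals the slope of the purely linear fit, so only the quadratic term carries new information.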

3. Temporal Trends¶

3.1. Hot Topics¶

In [70]:
from IPython.display import IFrame

IFrame(src='disp/3_journals_hot.html', width=1000, height=600)
Out[70]:

3.2. Cold Topics¶

In [74]:
from IPython.display import IFrame

IFrame(src='disp/3_journals_cold.html', width=1000, height=600)
Out[74]:

3.3. Reviving Topics¶

In [72]:
from IPython.display import IFrame

IFrame(src='disp/3_journals_reviving.html', width=1000, height=600)
Out[72]:

3.4. Evergreen Topics¶

In [71]:
from IPython.display import IFrame

IFrame(src='disp/3_journals_evergreen.html', width=1000, height=600)
Out[71]:

3.5. Wallflower Topics¶

In [73]:
from IPython.display import IFrame

IFrame(src='disp/3_journals_wallflowers.html', width=1000, height=600)
Out[73]:

Author Discourse Network Mapping¶

In [53]:
from IPython.display import IFrame

IFrame(src='disp/continous_color.html', width=900, height=600)
Out[53]:

1. Author Collaboration Network¶

In [42]:
from IPython.display import IFrame

IFrame(src='disp/colab_network.html', width=1000, height=600)
Out[42]:

2. Author Collaboration within their own community¶

In [43]:
from IPython.display import IFrame

IFrame(src='disp/within_form_colab_network.html', width=1000, height=600)
Out[43]:

3. Author Collaboration outside their community¶

In [44]:
from IPython.display import IFrame

IFrame(src='disp/outside_form_colab_network.html', width=1000, height=600)
Out[44]:

4. Discovering a new discourse community¶

In [45]:
from IPython.display import IFrame

IFrame(src='disp/new_discourse.html', width=1000, height=600)
Out[45]:

5. Collaboration with the new discourse community¶

In [46]:
from IPython.display import IFrame

IFrame(src='disp/network_with_new_cluster.html', width=1000, height=600)
Out[46]:

6. Interstitial Community¶

In [47]:
#(0.33+-0.02, 0.33+-0.02, 0.33+-0.02)

from IPython.display import IFrame

IFrame(src='disp/network_with_interstitial_cluster.html', width=1000, height=600)
Out[47]:
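
The comment in the cell above, `(0.33+-0.02, 0.33+-0.02, 0.33+-0.02)`, suggests that an author counts as interstitial when their activity is split almost evenly across the three communities. The sketch below encodes that reading; it is an interpretation of the comment, and `is_interstitial` with its `tolerance` parameter is a hypothetical helper.

```python
def is_interstitial(shares, tolerance=0.02):
    """True if every community share lies within `tolerance` of an equal split."""
    total = sum(shares)
    if total == 0:
        return False
    equal = 1 / len(shares)
    return all(abs(s / total - equal) <= tolerance for s in shares)


# Made-up shares of an author's output across three communities.
print(is_interstitial([0.34, 0.33, 0.33]))  # near-equal split
print(is_interstitial([0.60, 0.25, 0.15]))  # dominated by one community
```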

Triad mapping based on Scientific, Associational and Managerial (Powell et al., 2017)¶

In [ ]:
### List of Users: InnoEnergyEU, EUeic, EITUrbanMob, EITRawMaterials, EITManufactur, EITHealth, EITFood, EITeu, 
### EIT_Digital, ClimateKIC
In [21]:
from IPython.display import IFrame

IFrame(src='disp/EIT_Digital.html', width=700, height=600)
Out[21]:

Vocab for State, Market and Community¶

In [6]:
state_vocab = """Sovereignty, Constitution, National Security, Foreign Relations, Diplomacy, International Law, Human Rights, Civil Liberties, Public Services, Infrastructure, Public Health, Public Safety, Social Security, Social Welfare, Public Education, Public Transportation, Taxation, Fiscal Policy, Regulatory Framework"""
state_vocab = [word.strip() for word in state_vocab.split(',') if len(word)>0]
state_vocab
Out[6]:
['Sovereignty',
 'Constitution',
 'National Security',
 'Foreign Relations',
 'Diplomacy',
 'International Law',
 'Human Rights',
 'Civil Liberties',
 'Public Services',
 'Infrastructure',
 'Public Health',
 'Public Safety',
 'Social Security',
 'Social Welfare',
 'Public Education',
 'Public Transportation',
 'Taxation',
 'Fiscal Policy',
 'Regulatory Framework']
In [7]:
market_vocab = """Profits, Competition, Consumers, Supply Chain, Distribution, Pricing, Mergers & Acquisitions, Outsourcing, Globalization, Innovation, Technology, Intellectual Property, Risk Management, Branding, Advertising, Market Research, Market Share, Market Segmentation, Market Trends, Market Analysis"""
market_vocab = [word.strip() for word in market_vocab.split(',') if len(word)>0]
market_vocab
Out[7]:
['Profits',
 'Competition',
 'Consumers',
 'Supply Chain',
 'Distribution',
 'Pricing',
 'Mergers & Acquisitions',
 'Outsourcing',
 'Globalization',
 'Innovation',
 'Technology',
 'Intellectual Property',
 'Risk Management',
 'Branding',
 'Advertising',
 'Market Research',
 'Market Share',
 'Market Segmentation',
 'Market Trends',
 'Market Analysis']
In [8]:
community_vocab = """Volunteers, Charities, Social Groups, Local Organizations, Non-Governmental Organizations, Faith-Based Organizations, Community Centers, Neighborhoods, Clubs, Activists, Advocates, Social Movements, Social Enterprises, Social Networks, Social Media, Fundraisers, Donors, Philanthropy, Collaboration, Empowerment, Inclusion, Social Justice"""
community_vocab = [word.strip() for word in community_vocab.split(',') if len(word)>0]
community_vocab
Out[8]:
['Volunteers',
 'Charities',
 'Social Groups',
 'Local Organizations',
 'Non-Governmental Organizations',
 'Faith-Based Organizations',
 'Community Centers',
 'Neighborhoods',
 'Clubs',
 'Activists',
 'Advocates',
 'Social Movements',
 'Social Enterprises',
 'Social Networks',
 'Social Media',
 'Fundraisers',
 'Donors',
 'Philanthropy',
 'Collaboration',
 'Empowerment',
 'Inclusion',
 'Social Justice']
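
The notebook ends before the three vocabularies are used. One simple way they could be applied is to count how often each vocabulary's terms occur in a text and normalise the counts into state, market and community shares. This is a sketch under that assumption; `vocab_shares`, the shortened term lists and the sample text are all illustrative.

```python
import re


def vocab_shares(text, vocabularies):
    """Count case-insensitive whole-phrase matches per vocabulary, then normalise."""
    lowered = text.lower()
    counts = {}
    for name, terms in vocabularies.items():
        counts[name] = sum(
            len(re.findall(r"\b" + re.escape(term.lower()) + r"\b", lowered))
            for term in terms
        )
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: c / total for name, c in counts.items()}


# Shortened stand-ins for state_vocab, market_vocab and community_vocab.
vocabularies = {
    "state": ["Taxation", "Public Health", "Fiscal Policy"],
    "market": ["Innovation", "Pricing", "Market Share"],
    "community": ["Volunteers", "Donors", "Social Justice"],
}

text = (
    "Innovation policy depends on taxation and public health budgets, "
    "while volunteers and donors push for social justice and more innovation."
)
shares = vocab_shares(text, vocabularies)
print(shares)
```

Plain term counting ignores synonyms and context, so the resulting shares are at best a coarse signal.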